tags:
- SQL
Lecture 1 - Relating (NC)
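In this lecture, we pick up where the previous one left off: you will learn more about what a relational database is. We will also extend the material from the first lecture, Querying, and learn about nested querying.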
.tables
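This command returns the names of the tables in `longlist.db`, 7 in all.
Look through the tables in `longlist.db` and try to imagine relationships between them. Some examples are:
- Authors write books.
- Publishers publish books.
- Translators translate books.
- Books have ratings.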
Consider the `authors` and `books` tables with the author name and book title columns.
Looking through the `authors` table alone would give us no information about the books written by that author.
One way to connect the two is to rely on row order: the first row in the `authors` table will always correspond to the first row in the `books` table. The problem with this system is that one may make a mistake (add a book but forget to add its corresponding author, or vice versa). Also, an author may have written more than one book, or a book may be co-written by multiple authors.
[Entity relationship diagram for `longlist.db`. Entities: Author, Book, Publisher, Translator, Rating. Relationships: wrote, published, translated, has.]
If we have some database, how do we know the relationships among the entities stored inside of it?
Once we know that a relationship exists between certain entities, how do we implement that in our database?
The primary key of the `books` table is now a column in the `ratings` table, where it acts as a foreign key. This helps form a one-to-many relationship between the two tables: a book with a title (found in the `books` table) can have multiple ratings (found in the `ratings` table).
This is a one-to-many relationship between the `books` and `ratings` entities. Here’s an example of a many-to-many relationship.
There is now a table called `authored` that maps the primary key of `books` (`book_id`) to the primary key of `authors` (`author_id`).
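To make the junction table concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The table layout follows the description above, but the rows ('Author A', 'Book X' and so on) are made up for illustration and are not taken from `longlist.db`. The `REFERENCES` clauses are an assumption about how the foreign keys might be declared; SQLite only enforces them when `PRAGMA foreign_keys` is turned on.

```python
import sqlite3

# Toy version of the three tables; names and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "authors" ("id" INTEGER PRIMARY KEY, "name" TEXT);
    CREATE TABLE "books" ("id" INTEGER PRIMARY KEY, "title" TEXT);
    -- Junction table: each row links one author to one book.
    CREATE TABLE "authored" (
        "author_id" INTEGER REFERENCES "authors"("id"),
        "book_id" INTEGER REFERENCES "books"("id")
    );
    INSERT INTO "authors" VALUES (1, 'Author A'), (2, 'Author B');
    INSERT INTO "books" VALUES (1, 'Book X'), (2, 'Book Y');
    -- Book 1 has two authors, and author 1 has written two books.
    INSERT INTO "authored" VALUES (1, 1), (2, 1), (1, 2);
""")

# Each column of "authored" refers to exactly one table, so equal numbers
# in "author_id" and "book_id" never get confused with each other.
authors_of_book_1 = [row[0] for row in conn.execute(
    'SELECT "author_id" FROM "authored" WHERE "book_id" = 1 ORDER BY "author_id"'
)]
books_by_author_1 = [row[0] for row in conn.execute(
    'SELECT "book_id" FROM "authored" WHERE "author_id" = 1 ORDER BY "book_id"'
)]
print(authors_of_book_1)  # author IDs linked to book 1
print(books_by_author_1)  # book IDs linked to author 1
```

One book linked to several authors and one author linked to several books is exactly what makes the relationship many-to-many.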
Can the IDs of the author and the book be the same? For example, if `author_id` is 1 and `book_id` is also 1 in the `authored` table, will there be a mix-up?
Tables like `authored` are called “joint” or “junction” tables. In such tables, we usually know which primary key is referenced by which column. In this case, since we know that the first column contains the primary key of `authors` only and the second column similarly contains the primary key of `books` only, it would be okay even if the values matched!
If we have a lot of joint tables like this, wouldn’t that take up too much space?
On changing the ID of a book or author, does the ID get updated in the other tables as well?
In the `books` table, we have an ID to indicate the publisher, which is a foreign key taken from the `publishers` table. To find the books published by Fitzcarraldo Editions, we would need two queries: one to find the `publisher_id` of Fitzcarraldo Editions from the `publishers` table, and a second to use this `publisher_id` to find all the books published by Fitzcarraldo Editions. These two queries can be combined into one using a subquery.
SELECT "title"
FROM "books"
WHERE "publisher_id" = (
    SELECT "id"
    FROM "publishers"
    WHERE "publisher" = 'Fitzcarraldo Editions'
);
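Notice that the subquery is in parentheses. The innermost query runs first, and its result is then used by the outer query.
The same pattern finds all the ratings for the book In Memory of Memory.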
SELECT "rating"
FROM "ratings"
WHERE "book_id" = (
    SELECT "id"
    FROM "books"
    WHERE "title" = 'In Memory of Memory'
);
Applying `AVG` to the same selection gives the average rating of this book.
SELECT AVG("rating")
FROM "ratings"
WHERE "book_id" = (
    SELECT "id"
    FROM "books"
    WHERE "title" = 'In Memory of Memory'
);
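As a rough check that a subquery really is two queries folded into one, the sketch below runs both approaches on a toy database through Python's `sqlite3` module. The publisher names, titles and ratings are placeholders invented for this example; only the table and column names follow the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "publishers" ("id" INTEGER PRIMARY KEY, "publisher" TEXT);
    CREATE TABLE "books" ("id" INTEGER PRIMARY KEY, "title" TEXT, "publisher_id" INTEGER);
    CREATE TABLE "ratings" ("book_id" INTEGER, "rating" INTEGER);
    -- Made-up rows for illustration only.
    INSERT INTO "publishers" VALUES (1, 'Publisher P'), (2, 'Publisher Q');
    INSERT INTO "books" VALUES (10, 'Book X', 1), (11, 'Book Y', 2), (12, 'Book Z', 1);
    INSERT INTO "ratings" VALUES (10, 5), (10, 3), (11, 4);
""")

# Two-step approach: look up the publisher's ID, then use it in a second query.
(publisher_id,) = conn.execute(
    'SELECT "id" FROM "publishers" WHERE "publisher" = ?', ("Publisher P",)
).fetchone()
two_step = conn.execute(
    'SELECT "title" FROM "books" WHERE "publisher_id" = ? ORDER BY "title"',
    (publisher_id,),
).fetchall()

# One-step approach: the ID lookup becomes a subquery in parentheses.
one_step = conn.execute("""
    SELECT "title" FROM "books"
    WHERE "publisher_id" = (
        SELECT "id" FROM "publishers" WHERE "publisher" = 'Publisher P'
    )
    ORDER BY "title"
""").fetchall()

# An aggregate such as AVG can sit on top of the same kind of subquery.
average = conn.execute("""
    SELECT AVG("rating") FROM "ratings"
    WHERE "book_id" = (
        SELECT "id" FROM "books" WHERE "title" = 'Book X'
    )
""").fetchone()[0]

print(two_step == one_step, average)
```

Both approaches return the same titles; the subquery simply saves us from carrying the intermediate ID around by hand.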
The next query goes through three tables, `books`, `authors` and `authored`, to find the author(s) of the book Flights.
SELECT "name"
FROM "authors"
WHERE "id" = (
    SELECT "author_id"
    FROM "authored"
    WHERE "book_id" = (
        SELECT "id"
        FROM "books"
        WHERE "title" = 'Flights'
    )
);
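The first query that is run is the most deeply nested one, which finds the ID of the book Flights. Then, the ID of the author(s) who wrote Flights is found. Last, this is used to retrieve the author name(s).
IN
An author may have written several books, so an inner query can return more than one book ID. To find all the books written by Fernanda Melchor, we use the `IN` keyword as follows.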
SELECT "title"
FROM "books"
WHERE "id" IN (
    SELECT "book_id"
    FROM "authored"
    WHERE "author_id" = (
        SELECT "id"
        FROM "authors"
        WHERE "name" = 'Fernanda Melchor'
    )
);
Note that the innermost query uses `=` and not the `IN` operator. This is because we expect to find just one author named Fernanda Melchor.
What if the value of an inner query is not found?
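On the question of an inner query that finds nothing, and on the choice between `=` and `IN`, a small experiment can help. The sketch below uses Python's `sqlite3` module with made-up rows; `titles_by` is a hypothetical helper written for this example. It relies on SQLite's documented behavior that a subquery used as a single value evaluates to its first row, or to `NULL` when it returns no rows. Other database systems typically raise an error when such a subquery returns several rows, so treat this as SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "authors" ("id" INTEGER PRIMARY KEY, "name" TEXT);
    CREATE TABLE "books" ("id" INTEGER PRIMARY KEY, "title" TEXT);
    CREATE TABLE "authored" ("author_id" INTEGER, "book_id" INTEGER);
    -- Made-up rows: one author who has written two books.
    INSERT INTO "authors" VALUES (1, 'Author A');
    INSERT INTO "books" VALUES (1, 'Book X'), (2, 'Book Y');
    INSERT INTO "authored" VALUES (1, 1), (1, 2);
""")

def titles_by(name, operator):
    """Titles by the named author, comparing book IDs with '=' or 'IN'."""
    if operator not in ("=", "IN"):
        raise ValueError("operator must be '=' or 'IN'")
    query = f"""
        SELECT "title" FROM "books"
        WHERE "id" {operator} (
            SELECT "book_id" FROM "authored"
            WHERE "author_id" = (
                SELECT "id" FROM "authors" WHERE "name" = ?
            )
        )
        ORDER BY "title"
    """
    return [row[0] for row in conn.execute(query, (name,))]

with_in = titles_by("Author A", "IN")     # every matching book
with_equals = titles_by("Author A", "=")  # SQLite compares against one row only
missing = titles_by("Nobody", "IN")       # inner query finds no author
print(with_in, with_equals, missing)
```

With `IN`, both books come back; with `=`, only one does, which is why `IN` is the safer choice whenever the inner query may return several rows. When the author does not exist, the innermost subquery yields `NULL`, nothing compares equal to it, and the result is simply empty.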
Is it necessary to use four spaces to indent a subquery?
How can we implement a many-to-one relationship between tables?
Consider a book co-written by multiple authors. We would have an `authored` table with multiple entries for the same book ID. Each of these entries would have a different author ID. It is worth noting that foreign key values can be repeated within a table, but primary key values are always unique.
JOIN
To understand how `JOIN` works, consider a database of sea lions and their migration patterns.
We can join the tables `sea_lions` and `migrations` together such that each sea lion also has its corresponding information as an extension of the same row.
If `longlist.db` is still open, quit it using the `.quit` SQLite command. Then, open up `sea_lions.db`.
SELECT *
FROM "sea_lions"
JOIN "migrations" ON "migrations"."id" = "sea_lions"."id";
Notice that:
- The `ON` keyword is used to specify which values match between the tables being joined. It is not possible to join tables this way without matching values.
- If an ID is present in one table but not in the other, its row will not appear in the joined table. This kind of join is called an `INNER JOIN`.
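The second point above is easy to see on a toy version of the two tables. This is only a sketch: the column names `name` and `distance` and all rows are assumptions made for the example, not the actual contents of `sea_lions.db`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical columns and rows, for illustration only.
    CREATE TABLE "sea_lions" ("id" INTEGER, "name" TEXT);
    CREATE TABLE "migrations" ("id" INTEGER, "distance" INTEGER);
    INSERT INTO "sea_lions" VALUES (1, 'Lion A'), (2, 'Lion B'), (3, 'Lion C');
    -- ID 3 has no migration row, and ID 4 has no sea lion row.
    INSERT INTO "migrations" VALUES (1, 1500), (2, 2000), (4, 900);
""")

inner_rows = conn.execute("""
    SELECT "sea_lions"."id", "name", "distance"
    FROM "sea_lions"
    JOIN "migrations" ON "migrations"."id" = "sea_lions"."id"
    ORDER BY "sea_lions"."id"
""").fetchall()
print(inner_rows)
```

Only IDs 1 and 2 appear in the result, because they are the only ones present in both tables.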
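Other ways of joining tables, which retain unmatched IDs, are `LEFT JOIN`, `RIGHT JOIN` and `FULL JOIN`. Each of these is a kind of `OUTER JOIN`.
A `LEFT JOIN` prioritizes the data in the left (or first) table. One way to think about it: start from an outer join, then keep rows according to whether the join is left or right.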
SELECT *
FROM "sea_lions"
LEFT JOIN "migrations" ON "migrations"."id" = "sea_lions"."id";
This query would retain all sea lion data from the `sea_lions` table, the left one. Some rows in the joined table could be partially blank. This would happen if the right table didn’t have data for a particular ID.
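The partially blank rows can be reproduced on a toy version of the tables. As before, the column names `name` and `distance` and the rows are assumptions made for this sketch. In Python's `sqlite3` module, a `NULL` value comes back as `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical columns and rows, for illustration only.
    CREATE TABLE "sea_lions" ("id" INTEGER, "name" TEXT);
    CREATE TABLE "migrations" ("id" INTEGER, "distance" INTEGER);
    INSERT INTO "sea_lions" VALUES (1, 'Lion A'), (2, 'Lion B'), (3, 'Lion C');
    INSERT INTO "migrations" VALUES (1, 1500), (2, 2000), (4, 900);
""")

left_rows = conn.execute("""
    SELECT "sea_lions"."id", "name", "distance"
    FROM "sea_lions"
    LEFT JOIN "migrations" ON "migrations"."id" = "sea_lions"."id"
    ORDER BY "sea_lions"."id"
""").fetchall()

# Sea lions that were kept even though the right table had nothing for them.
unmatched = [name for (_id, name, distance) in left_rows if distance is None]
print(left_rows)
print(unmatched)
```

Every sea lion is kept, and the one without migration data gets `None` in the `distance` column. The migration row with ID 4 is dropped, since it belongs to the right table only.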
Similarly, a `RIGHT JOIN` retains all the rows from the right (or second) table. A `FULL JOIN` allows us to see all the rows from both tables.
An `OUTER JOIN` could lead to empty or `NULL` values in the joined table.
Both tables in the sea lions database have the column `id`. Since the value on which we are joining the tables has the same column name in both tables, we can actually omit the `ON` section of the query while joining.
SELECT *
FROM "sea_lions"
NATURAL JOIN "migrations";
Notice that the result does not have a duplicate `id` column in this case. Also, this join works similarly to an `INNER JOIN`.
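To see the difference in the result columns, the sketch below compares the column names produced by the two forms of join, using Python's `sqlite3` module. The tables are toy versions with assumed columns (`name`, `distance`) and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical columns and rows, for illustration only.
    CREATE TABLE "sea_lions" ("id" INTEGER, "name" TEXT);
    CREATE TABLE "migrations" ("id" INTEGER, "distance" INTEGER);
    INSERT INTO "sea_lions" VALUES (1, 'Lion A'), (2, 'Lion B'), (3, 'Lion C');
    INSERT INTO "migrations" VALUES (1, 1500), (2, 2000), (4, 900);
""")

explicit = conn.execute("""
    SELECT * FROM "sea_lions"
    JOIN "migrations" ON "migrations"."id" = "sea_lions"."id"
""")
explicit_columns = [column[0] for column in explicit.description]

natural = conn.execute('SELECT * FROM "sea_lions" NATURAL JOIN "migrations"')
natural_columns = [column[0] for column in natural.description]
natural_rows = sorted(natural.fetchall())

print(explicit_columns)  # "id" shows up once per table
print(natural_columns)   # the shared "id" column shows up once
print(natural_rows)
```

The natural join keeps a single `id` column and, like an inner join, returns only the IDs found in both tables.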
In the sea lions database, how are the IDs created? Do they come from the `sea_lions` table or the `migrations` table?
If we are trying to join three tables, how can we know which the left or right tables are?
For each `JOIN` statement, the table before the keyword is the left one. The table named after the `JOIN` keyword is the right one.
When we join tables, does the resulting joined table get saved? Can we reference it later without joining again?
When we use `JOIN` like this, the result is a temporary table or a result set. It can be used for the duration of the query.
There are many different kinds of `JOIN`. Is there a default one we should use?
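The simplest kind, a plain `JOIN`, is actually an `INNER JOIN`, and that’s the default in SQL.
For the next queries, we return to `longlist.db`.
To find the people who are both authors and translators, we can use the `INTERSECT` operator.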
SELECT "name" FROM "translators"
INTERSECT
SELECT "name" FROM "authors";
To find everyone who is an author, a translator, or both, we use `UNION`.
SELECT "name" FROM "translators"
UNION
SELECT "name" FROM "authors";
Notice that every author and every translator is included in this result set, but only once!
We can also label each person with their profession in the result set.
SELECT 'author' AS "profession", "name"
FROM "authors"
UNION
SELECT 'translator' AS "profession", "name"
FROM "translators";
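A small experiment shows how these set operations treat duplicates. This sketch uses Python's `sqlite3` module with two made-up tables in which 'Person B' is both an author and a translator; the names are placeholders, and the `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "authors" ("name" TEXT);
    CREATE TABLE "translators" ("name" TEXT);
    -- Made-up names: Person B appears in both tables.
    INSERT INTO "authors" VALUES ('Person A'), ('Person B');
    INSERT INTO "translators" VALUES ('Person B'), ('Person C');
""")

both = conn.execute("""
    SELECT "name" FROM "translators"
    INTERSECT
    SELECT "name" FROM "authors"
    ORDER BY "name"
""").fetchall()

either = conn.execute("""
    SELECT "name" FROM "translators"
    UNION
    SELECT "name" FROM "authors"
    ORDER BY "name"
""").fetchall()

labeled = conn.execute("""
    SELECT 'author' AS "profession", "name" FROM "authors"
    UNION
    SELECT 'translator' AS "profession", "name" FROM "translators"
    ORDER BY "name", "profession"
""").fetchall()

print(both)
print(either)
print(labeled)
```

In the plain `UNION`, Person B is listed once. In the labeled version, Person B is listed twice, because the two rows differ in the `profession` column and so are no longer duplicates.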
To find the people who are authors and only authors, the `EXCEPT` keyword can be used. In other words, the set of translators is subtracted from the set of authors to form this one.
SELECT "name" FROM "authors"
EXCEPT
SELECT "name" FROM "translators";
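We can verify that no author-translator from the intersection set appears in this result set.
Similarly, the people who are only translators can be found by swapping the two queries around `EXCEPT`.
These operators can also be combined with subqueries, for example to find the books that Sophie Hughes and Margaret Jull Costa have both translated.
SELECT "book_id" FROM "translated"
WHERE "translator_id" = (
    SELECT "id" FROM "translators"
    WHERE "name" = 'Sophie Hughes'
)
INTERSECT
SELECT "book_id" FROM "translated"
WHERE "translator_id" = (
    SELECT "id" FROM "translators"
    WHERE "name" = 'Margaret Jull Costa'
);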
Each of the nested queries here finds the IDs of the books for one translator. The `INTERSECT` keyword is used to intersect the resulting sets and give us the books they have collaborated on.
Could we use `INTERSECT`, `UNION` etc. to perform operations on 3-4 sets?
Yes. To intersect 3 sets, we would use the `INTERSECT` operator twice. An important note: we have to make sure to have the same number and same types of columns in the sets to be combined using `INTERSECT`, `UNION` etc.
Consider the `ratings` table. For each book, we want to find the average rating of the book. To do this, we would first need to group ratings together by book and then average the ratings out for each book (each group).
SELECT "book_id", AVG("rating") AS "average rating"
FROM "ratings"
GROUP BY "book_id";
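In this query, the `GROUP BY` keyword was used to create groups for each book and then collapse the ratings of the group into an average rating!
To see only the books with an average rating above 4.0, rounded to 2 decimal places, we can add a condition on the groups.
SELECT "book_id", ROUND(AVG("rating"), 2) AS "average rating"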
FROM "ratings"
GROUP BY "book_id"
HAVING "average rating" > 4.0;
Note that the `HAVING` keyword is used here to specify a condition for the groups, instead of `WHERE` (which can only be used to specify conditions for individual rows).
Is it possible to see the number of ratings given to each book?
Yes, using the `COUNT` keyword.
SELECT "book_id", COUNT("rating")
FROM "ratings"
GROUP BY "book_id";
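The difference between `HAVING` and `WHERE` is easiest to see side by side. The sketch below uses Python's `sqlite3` module and a made-up `ratings` table with three books; the column alias "number of ratings" is introduced here for readability and does not come from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "ratings" ("book_id" INTEGER, "rating" INTEGER);
    -- Made-up ratings: book 1 averages 4.5, book 2 averages 3.0, book 3 averages 5.0.
    INSERT INTO "ratings" VALUES (1, 5), (1, 4), (2, 3), (2, 4), (2, 2), (3, 5);
""")

# One row per book: average rating and number of ratings in the group.
per_book = conn.execute("""
    SELECT "book_id", ROUND(AVG("rating"), 2) AS "average rating",
           COUNT("rating") AS "number of ratings"
    FROM "ratings"
    GROUP BY "book_id"
    ORDER BY "book_id"
""").fetchall()

# HAVING filters whole groups after the averages have been computed.
having_rows = conn.execute("""
    SELECT "book_id", ROUND(AVG("rating"), 2) AS "average rating"
    FROM "ratings"
    GROUP BY "book_id"
    HAVING "average rating" > 4.0
    ORDER BY "book_id"
""").fetchall()

# WHERE filters individual ratings before any grouping happens.
where_rows = conn.execute("""
    SELECT "book_id", ROUND(AVG("rating"), 2) AS "average rating"
    FROM "ratings"
    WHERE "rating" > 4.0
    GROUP BY "book_id"
    ORDER BY "book_id"
""").fetchall()

print(per_book)
print(having_rows)
print(where_rows)
```

With `HAVING`, book 1 keeps its true average of 4.5. With `WHERE`, its rating of 4 is discarded before grouping, so the reported average changes to 5.0, which answers a different question.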
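Is it also possible to sort the data obtained here?
Yes, by adding an `ORDER BY` clause. Here, the books with an average rating above 4.0 are listed from the highest average down.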
SELECT "book_id", ROUND(AVG("rating"), 2) AS "average rating"
FROM "ratings"
GROUP BY "book_id"
HAVING "average rating" > 4.0
ORDER BY "average rating" DESC;
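As a quick check of the sorted query, here is a sketch that runs it on a made-up `ratings` table through Python's `sqlite3` module. The second query, which sorts by the number of ratings, is an extra illustration using a hypothetical alias "number of ratings".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "ratings" ("book_id" INTEGER, "rating" INTEGER);
    -- Made-up ratings for three books.
    INSERT INTO "ratings" VALUES (1, 5), (1, 4), (2, 3), (2, 4), (2, 2), (3, 5);
""")

# Grouping, filtering the groups and sorting them, in that order.
best_first = conn.execute("""
    SELECT "book_id", ROUND(AVG("rating"), 2) AS "average rating"
    FROM "ratings"
    GROUP BY "book_id"
    HAVING "average rating" > 4.0
    ORDER BY "average rating" DESC
""").fetchall()

# Any aggregate column can serve as the sort key, for example the count.
most_rated_first = conn.execute("""
    SELECT "book_id", COUNT("rating") AS "number of ratings"
    FROM "ratings"
    GROUP BY "book_id"
    ORDER BY "number of ratings" DESC
""").fetchall()

print(best_first)
print(most_rated_first)
```

Sorting happens last, after the groups have been formed and filtered, so `ORDER BY` can refer to the aggregated column by its alias.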